Individual Poster Page


How are Runs Really Created

August 13, 2002 - Michael Humphreys

How is what's been described any different from a Lindsey-Palmer "change-in-state" model based upon "base-out" run expectation states that are customized per player, per team or per league? See "Curve Ball," the recent book by Jim Albert and Jay Bennett. I'm looking forward to the next installment of your article, but it seems that if you customize a model for every possible situation, it no longer has the conceptual simplicity and practical applicability of a model, and is merely a highly intricate description.


How are Runs Really Created

August 14, 2002 - Michael Humphreys

Thanks for your response to my question, which you answered. You are absolutely right that run-expectancies vary *tremendously* based upon who the pitcher is, who his fielders are, and who is coming up behind the batter. In addition, I believe one of the postings had a good point that *win*-expectancies also vary tremendously depending on the inning and the amount of a lead. With computers, it is now possible to calculate the extent to which a batter increases or decreases run (or win) expectancy each time he comes to the plate, and any such evaluation is clearly the most accurate. The point I was trying to make is that the computational complexity of the model, which is a cinch for a computer, is overwhelming for a human being, and the rating generated therefore has a sort of "black box" quality to the average fan. Simple formulas such as OPS or Palmer linear weights can be apprehended more easily and probably, over the long haul, tend to provide values that approach the virtually perfect values determined under your model. Thanks again.


How are Runs Really Created

August 14, 2002 - Michael Humphreys

Tangotiger--Thanks again for your response. Another question. I've begun to wonder recently whether OPS, Runs Created and Linear Weights falsely inflate the impact of an exceptional player, such as Bonds, not only for the (correct) reasons you cite, but also because such formulas effectively treat the offensive events of *one* player as if they were "spread around" the whole batting order. In other words, something tells me that if Barry could "give" his exceptional marginal performance to each of his teammates, SF would score a lot more runs. A regression analysis I recently did of 1999-2001 major league team data resulted in a formula that, although generally consistent with Linear Weights, yielded a projected run total for San Francisco that was *much* higher than the number of runs they actually scored--the error might have been as much as 40 or 50 runs. Do you have the same suspicion about Barry's *real* marginal contribution that I have? I have the feeling that your model could help us answer that question.
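For what it's worth, here is a rough sketch of the kind of team-level regression I'm describing--ordinary least squares of team runs on the counting events. The input file and column names are made up for illustration; the residuals are where a team like SF would show up as over- or under-projected by the fit.

    import numpy as np
    import pandas as pd

    # hypothetical file: one row per team-season, with counting stats and runs
    teams = pd.read_csv("team_seasons.csv")

    X = teams[["singles", "doubles", "triples", "hr", "bb"]].to_numpy(dtype=float)
    X = np.column_stack([np.ones(len(X)), X])        # add an intercept
    y = teams["runs"].to_numpy(dtype=float)

    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)    # intercept plus per-event run weights
    residuals = y - X @ coefs                        # e.g., SF's actual runs minus the fit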


How are Runs Really Created

August 14, 2002 - Michael Humphreys

Thanks, Tangotiger. The way I was thinking about the question, I wonder what would be the difference in team runs scored if one ran the following two types of simulations: first, simulations with Barry and a lineup of 8 guys each with an average OBP and slugging percentage, then, second, simulations with *nine* guys *each* of whom has an OBP and a slugging percentage equal to the *weighted* average in the first simulation. I think my question was less whether Barry has the same impact with a good or bad team, but rather whether "Barry Plus" eight average guys is less productive than a "Barry Blend" of nine guys each with a bit of Barry's marginal magic. Very interested to see what you think. Thanks.
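In case it helps to see the question in concrete form, here is a toy simulation of the two lineups--on-base events only, station-to-station advancement, no slugging--so it is only a sketch of the idea rather than a real run-scoring model, and the OBP figures are invented.

    import random

    def sim_inning(lineup, start_idx):
        # one inning: each on-base event advances every runner exactly one base
        outs, runs, i = 0, 0, start_idx
        bases = [False, False, False]                  # 1B, 2B, 3B occupied?
        while outs < 3:
            obp = lineup[i % len(lineup)]
            i += 1
            if random.random() < obp:
                if bases[2]:
                    runs += 1                          # runner on third scores
                bases = [True, bases[0], bases[1]]
            else:
                outs += 1
        return runs, i

    def sim_game(lineup, innings=9):
        idx, total = 0, 0
        for _ in range(innings):
            runs, idx = sim_inning(lineup, idx)
            total += runs
        return total

    def mean_runs(lineup, games=20000):
        return sum(sim_game(lineup) for _ in range(games)) / games

    barry_plus = [0.450] + [0.330] * 8                 # one star plus eight average guys
    barry_blend = [sum(barry_plus) / 9] * 9            # nine guys at the weighted average
    print(mean_runs(barry_plus), mean_runs(barry_blend))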


How are Runs Really Created

August 15, 2002 - Michael Humphreys

Thanks, Tangotiger. Now I get it.


Pitch Counts, estimated (August 8, 2003)

Discussion Thread

Posted 2:10 p.m., August 13, 2003 (#2) - Michael Humphreys
  Tango,

Another nascent piece of conventional wisdom throttled in its crib!

I've been reading too much Bill James. (To be fair, James is the first to admit when his metaphors are grotesque and grotesquely over-extended. And they're usually fun to read--a true guilty pleasure.)

Speaking of James, he writes in the TNBJHBA that those 1970s pitchers who pitched so many innings seemed to have long careers anyway.

My two cents, for what it's worth, is that there seems to be very little evidence that high pitch counts or high numbers of innings pitched hurt pitchers who are neither young (under 25) nor old (over 31 or 32). This would imply that pitchers in their prime who are at least not bad should be allowed to pitch more, so that poor pitchers aren't given too many opportunities to screw up.

As you know, I've been doing a lot of historical fielding analysis, and have had occasion to look at a lot of baseball-prospectus "team" pages between 1974 and 2001. It's shocking to see how the number of pitchers per team has risen over that time period. It cannot be a good strategy to dip that low into the pitching talent pool. The fact that teams *have* been doing so may explain the phenomenon--especially apparent in the 1990s--of the high number of pitchers who have been *historically* dominant on a "rate" basis compared to "league average" pitching. Think Pedro, The Big Unit. The diluted pitching talent pool may also partly explain the hitting explosion over the years, though obviously the trends toward smaller parks, much more muscular ballplayers, thin-handled bats, etc., etc., are probably more relevant.

Thanks, as ever, for some truly informative sabermetrics.


Accuracy of Run Estimators (September 12, 2003)

Discussion Thread

Posted 3:51 p.m., September 12, 2003 (#8) - Michael Humphreys
  Tango,

Thanks for posting Patriot's article. Patriot mentions John Jarvis' excellent survey of 30 or so run estimators.

I agree that simulators, when the data are available, are generally better than simple regression analyses.

The book "Curve Ball", by two former Chairs of the Sports Section of the American Statistical Association, has an excellent explanation of the weaknesses, and strengths, of regression analysis as applied to baseball offensive statistics. Curve Ball points out that the regression weights for doubles tend to be too low, and the weights for home runs slightly too high, probably because of the cross-correlation between both variables. That being said, I don't think any player would be significantly mis-rated if the regression weights were used instead of the simulator weights.

For example, let's say you've got a high doubles/moderate home run guy like Kirby Puckett, say 40 doubles and 20 homers, and a lower doubles, higher homers guy like Mickey Mantle, say 20 doubles and 40 homers. If you use the Curve Ball weight for doubles (.67) and homers (1.5), the doubles/homers runs for Kirby are 57. Using the Pete Palmer linear weights (.80/1.40), the Kirby doubles/homers runs are 60. For "Mickey", the regression runs are 73; the Palmer runs are 72.
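Here's the same back-of-the-envelope comparison in a few lines of Python (the two stat lines are the hypothetical ones above, and only doubles and homers are being valued):

    players = {"Kirby-type (40 2B, 20 HR)": (40, 20),
               "Mickey-type (20 2B, 40 HR)": (20, 40)}
    weights = {"Curve Ball": (0.67, 1.50), "Palmer": (0.80, 1.40)}

    for name, (doubles, hr) in players.items():
        for label, (w2b, whr) in weights.items():
            print(name, label, round(doubles * w2b + hr * whr))
    # Kirby-type: 57 (Curve Ball) vs. 60 (Palmer); Mickey-type: 73 vs. 72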

The SB/CS weights have more significant errors, for reasons explained in Curve Ball, but even using Mitchell's UZR weights, a Rickey Henderson wasn't creating more than a dozen runs or so in a typical season with his basestealing.

Curve Ball directly addresses the important issue you've identified of *validating* regression weights derived from Sample A seasons by applying them to Sample B seasons. Curve Ball found the regression weights from a 1954-1999 sample worked well "out of sample". See pages 181-82. In my pitching and fielding rating system, which uses regression analysis, the weights derived from various sub-samples of the data were virtually indistinguishable, except the weight for outs, which moved in sync with Pete Palmer's and Mitchell Lichtman's results. Major league baseball is a remarkably stable "system".

I think the larger point Patriot makes is worth making: there are many offensive formulas based on counting stats that are well-developed and reliable. I think the next major advances for offensive evaluation will build upon your work on Win Expectancy; i.e., the actual change in expected runs and wins created by a player, based upon his actual plate-appearance-by-plate-appearance data. One of the co-authors of Curve Ball, Jim Albert, has run a similar model for run-creation using 1987 Retrosheet data. The play-by-play system, customized per player, did reveal some differences not captured by aggregate data, but even the highly simplistic and flawed OPS statistic had a very strong straight-line relationship to the customized runs created data. The other issue in using PA-by-PA data is whether it captures real player skills in maximizing their positive impact, given their base-out scenario opportunities, or just random *measured* contextual impact. It's a subtle and more complete analysis of the old "clutch hitting" question.


Accuracy of Run Estimators (September 12, 2003)

Discussion Thread

Posted 4:19 p.m., September 12, 2003 (#10) - Michael Humphreys
  Tango,

Thanks for the link to Tom's article. It doesn't provide an overall evaluation of the difference between "counting stat" run estimates and PA-by-PA run estimates. I'll write to Tom, but do you know offhand the improvement Tom's system makes in terms of long-term career assessments?


Accuracy of Run Estimators (September 12, 2003)

Discussion Thread

Posted 9:02 p.m., September 12, 2003 (#11) - Michael Humphreys
  Tango,

I had an idea that might be interesting to evaluate while you are developing your Win Expectancy model. Your article on your website persuaded me that "Runs Produced" (R + RBI *minus* HR) is a better "quick and dirty" evaluation statistic for batters than R+RBI. Runs Produced obviously reflects the contextual opportunities of the batter, his own "run-creating" production, and his clutch "performance". But it's a great stat for treating different kinds of batters fairly: on-base guys and power guys.

Do you think that some sort of weighted average of Runs Produced (perhaps scaled to league-average based on PA or outs) and BaseRuns (all of which use counting stats) might have a good match with a PA-by-PA "runs value added" (the last step before Win Expectancy)?

Also, not to beat a dead horse, but I did some quick calculations regarding the potential scale of error if one were to use regression-based SB/CS weights instead of the (correct) BaseRuns or UZR weights. I looked up Rickey Henderson's career SB/CS data. If you apply the regression weights from Patriot (+.211 per SB; -.262 per CS), Rickey's career stolen base runs is +208. I believe the BaseRuns weights, derived from 1974-1990 data, are +.193 per SB and -.437 per CS. If you apply those weights, that makes +124 runs for Rickey. Now 84 runs is a large number, although the difference occurs over the course of the equivalent of 19 162-game seasons, so that's 4.4 runs per 162-game season. And Rickey is the *most* extreme case.
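The same calculation in code, assuming career totals of roughly 1,406 SB and 335 CS for Rickey (approximate figures; the exact totals don't change the point):

    sb, cs = 1406, 335
    regression_runs = 0.211 * sb - 0.262 * cs      # Patriot's regression weights
    baseruns_runs   = 0.193 * sb - 0.437 * cs      # 1974-1990 BaseRuns-derived weights
    print(round(regression_runs), round(baseruns_runs),
          round(regression_runs - baseruns_runs))  # roughly 209, 125, an ~84-run gap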

Another way to see it is that the SB runs are roughly equivalent, but the CS run weights differ by just slightly less than .2 runs. The most CS Rickey ever had in one season was about 40 (when he set the record for SB). So that's an 8-run error; not nothing, but that is the most extreme single season outcome I can think of.

UZR weights differ slightly more, but I think that's because they're derived from late 1990s/early 2000s seasons, when outs were more costly than in 1974-1990.


Accuracy of Run Estimators (September 12, 2003)

Discussion Thread

Posted 11:24 p.m., September 12, 2003 (#14) - Michael Humphreys
  Robert,

Yes, we should use the best methods available--when they are available. I just wanted to clarify that regression analysis, used carefully, is a good tool that provides good estimates. I believe that we have the necessary data to derive "change-in-state" results or to run simulations for offense throughout baseball history, or that at least Pete Palmer has it. But when we don't have it, regression analysis is a good back-up; indeed, sometimes it can reveal new relationships.

Sometimes these relationships are spurious and misleading. As explained in "Curve Ball", the fact that regression analysis only provides a measure of a statistical "association" between discrete offensive events and runs scored means that if you include Sac Flies in the regression, they get *way* overweighted, because Sac Flies "carry" information about the run-scoring context in which they occur: you generally won't have a lot of Sac Flies unless you have a lot of runners on third, and having a lot of runners on third is statistically associated with scoring more runs, whether or not they come home via a Sac Fly.

Sometimes, however, you can *apply* regression analysis to *limit* regression analysis errors of the Sac Fly kind, and reveal new information.

One thing I've been meaning to do is a regression analysis on the factors that are associated with stolen bases; i.e., how do team strikeouts, walks, homeruns, etc., impact, on average, the number of bases a team steals and the number of runners caught stealing.

It would make sense, given what we know about the game, that more strikeouts would be associated (through regression analysis) with more stolen bases (due to longer pitch counts), more walks would be associated with more stolen bases (due to more runners on base and longer pitch counts), more homeruns would be associated with *fewer* stolen bases (since teams with power would not risk the out, and homeruns clear the bases).

After determining the number of stolen bases per team *not* explained by these factors, you would have a better idea of the *context-adjusted* stolen bases by a team. That might form the basis for more refined estimates of the "real" stolen base contribution of a player, given the stolen base context of his team.

Furthermore, if such "context-adjusted" stolen base numbers were used instead of plain stolen base numbers in a regression analysis of team runs scored "onto" walks, singles, doubles, etc., you would get an estimate of the number of runs associated with SB/CS *after* taking into account the impact of factors that supersede or "control" stolen bases. In other words, you could find out the number of runs associated with stolen bases not accounted for by the "context" in which stolen base attempts tend to occur.
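A quick sketch of that two-step idea, using the same hypothetical team-season file and column names as before: first regress team SB on the "context" variables and keep the residuals, then substitute the residuals for raw SB in the runs regression.

    import numpy as np
    import pandas as pd

    teams = pd.read_csv("team_seasons.csv")          # hypothetical file and column names

    def ols(X, y):
        X = np.column_stack([np.ones(len(X)), X])
        coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
        return coefs, y - X @ coefs

    # Step 1: how much of team SB is explained by the K/BB/HR context?
    _, sb_resid = ols(teams[["so", "bb", "hr"]].to_numpy(dtype=float),
                      teams["sb"].to_numpy(dtype=float))

    # Step 2: runs regression with context-adjusted SB in place of raw SB
    X2 = np.column_stack([teams[["singles", "doubles", "triples", "hr", "bb"]].to_numpy(dtype=float),
                          sb_resid])
    coefs2, _ = ols(X2, teams["runs"].to_numpy(dtype=float))
    # coefs2[-1] is the run value associated with SB after controlling for context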

We might find that that run value has a surprising weight. We might not. Even if we didn't, we might learn something that would help our simulator models.

We need to remember that change-in-state data and simulators *also* work off of averages. They tell us the *average* weight of a stolen base, given an offense that is average in all respects. If we discover, through regression analysis, that a lot of stolen base numbers are explained by the contexts I've described, perhaps it would support refining the *simulator* to answer the question: "What is the change in run expectation when there is a stolen base given a high strikeout/walk/homerun context?" We might not find anything. But regression analysis is an easy-to-use method for discovering new relationships, which can then be tested using simulation models or more detailed play-by-play data.

This is precisely the tack taken in Curve Ball, which first introduces the idea of a batting Linear Weights equation through regression analysis, identifies the Sac Fly "carrier" problem, and then uses Lindsey-Palmer change-in-state models to get at the best answer.

I suppose the other point I was trying to make is that all of the "counting stat" models for offense are so similar in their degree of accuracy (including, yes, simple regression analysis) that it is time to take the next step, as Tango is doing, to try to get "runs value added" or Win Expectancy systems developed. The Jim Albert system I mentioned has something like *half* the standard error of the counting stat models. *That* is the kind of big improvement that best repays the extra effort of using much more complicated data sets and analytical techniques.


Accuracy of Run Estimators (September 12, 2003)

Discussion Thread

Posted 12:44 a.m., September 13, 2003 (#15) - Michael Humphreys
  Tango,

I'm heading out for the rest of the weekend, but I just saw your post. You're right. The BaseRuns "absolute" weight for CS is -.28, not -.44. Patriot's regression result is also on an absolute scale. His CS weight is -.26. (The SB weights are .19 and .21, respectively.)

Patriot's post #3 was indeed good stuff. The regression model standard error using the whole sample was 22.6. When he re-ran the regression on half of the seasons and applied the weights to the other half of the sample, the standard error went up by only 1.1 runs, to 23.7 runs per team per season.


Accuracy of Run Estimators (September 12, 2003)

Discussion Thread

Posted 11:55 p.m., September 16, 2003 (#21) - Michael Humphreys
  Tango,

I followed your link above and thought the following comment of yours was worth quoting here:

"When looking at exteme cases, that is those cases where EqR and xR and RC are limited to and fail at, you might unearth say an extra 10 runs in there. 10 runs = 1 win. And a team will pay 2 million$ / win.

"So, in the case of say Delgado, you really need the absolute best most precise measure you can get. Even BsR by itself is not enough. You'd need to create a sim model that will show exactly how Delgado will interact, or expect to interact, with his teammates, and the impact of Delgado specifically. You might also find that putting him in the #2 spot will further add more runs and wins to the team.

"Sign him for 5 years, and you now need an "aging" model, similar to the (unverified but seemingly sharp) PECOTA. Add all that up, and you can find say 10 million$ of value in a player. You can make a case say that while the market may value Delgado at 68 million over the next 4 years, you might figure out that he's really worth 52 million$. (Say like what the A's did with Giambi.)

"Use the right tool for the right job. And, for the casual fan, OPS is fine. For the more devoted fans, EqR, xR, RC, and Win Shares are fine too. But, don't try selling these things as more than they are. Limits. That's what it's about.

I thought this was spot on. Using more sophisticated techniques has potentially greater value when you're dealing with outliers--both because the better models get a better answer and because the financial relevance is much greater.

I also re-read your "How Runs Are Really Created" article and agree that BsR is truly a *model* of run scoring--not just a statistical "match" with typical run-scoring environments.

I'm very much sold on the relevance of BsR for extreme pitchers, who create their own run-context. Have you done any analyses that show how BsR improves the valuations for the Gagnes and Pedros? Also, have you found some hitters that have been significantly and persistently misvalued under "linear" formulas? Say, Bonds, or, going further back, Williams v. Ruth?


Patriot: Baselines (September 17, 2003)

Discussion Thread

Posted 1:06 a.m., September 18, 2003 (#1) - Michael Humphreys
  Excellent article--maybe the best on replacement value that I've seen, and certainly the most balanced and comprehensive.

I ultimately agree with Patriot that it's best to provide two or three baselines. For position players, perhaps .500, .425 and .350 would work. I'd actually prefer a "rate" metric based on runs (including defensive runs) per 162 games, because it's easier to translate into wins/losses. For people putting together reference books, aside from the massive work in figuring out the levels, it would only involve adding an extra column or two to ratings: e.g., TPR at .500 level; TPR at .425 level, etc.

For making career ratings of all-time greats, I don't think it's generally a good idea to compare them to the .350 level, because no team would allow a player to stay for any length of time at that level. The Tango/Nate Silver models are neat.

For major league organizations trying to build an actual team of 25 players, half of whom won't play that much, multiple levels make sense. Teams don't have eight positions and a pitcher: they have eight full-time positions, utility players needed in reserve, five starting pitchers, a closer or two, some set-up pitchers, etc. To be simplistic about it, there are roles for role players and different pools of replacement talent for each role. If your team needs to fill a role, you look at the population of players who can fill it and pick the best one. You never go out and get a .350 utility infielder to replace your starting shortstop. You may have to accept using one for a while, but you won't for long.

Chaining is so hard to model, especially for pitchers. Again, would one include mop-up guys in the "replacement pool" if you lost your #2 starter? No, you'd probably work your #1, 3, 4 and 5 guys more. It's a real mess, and I don't know how to model it. Unfortunately, it really has a great deal of relevance in Cy Young awards and in putting together good career ratings.


TheStar.com - Analyze this: NBA '04 (September 19, 2003)

Discussion Thread

Posted 3:04 p.m., September 19, 2003 (#4) - Michael Humphreys
Sometime in the past year the New York Times had an article about a basketball evaluation system that sounds like the +/- system alluded to in the posted article. The +/- system seems to evaluate each combination of 5 players that has been on the court and compares how each combination does relative to the others. How that might be translated into individual ratings is beyond me, although it might be something as simple as comparing the mean scores of *four*-player combinations with and without the player being rated, and some adaptation of a t-test for the scale and statistical significance of the player's impact. The limitation of that approach might be that it only captures the interactive impact of a player with that particular combination (or all four-player combinations for that team), not the player's context-independent skill in an "average" environment. The other problem is that you would probably want to adjust ratings for each combination for opponent quality and home/road impact, which is very significant in basketball. The sample sizes could get quite small very quickly.
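A bare-bones version of the on/off idea, just to show the bookkeeping--the stint data, field names and any opponent or home/road adjustments are all assumed, not something from the article:

    from dataclasses import dataclass

    @dataclass
    class Stint:
        players: frozenset     # the five players on the floor
        net_points: int        # team points minus opponent points during the stint
        minutes: float

    def on_off_per48(stints, player):
        on = [s for s in stints if player in s.players]
        off = [s for s in stints if player not in s.players]
        def per48(group):
            mins = sum(s.minutes for s in group)
            return 48 * sum(s.net_points for s in group) / mins if mins else 0.0
        return per48(on) - per48(off)   # player's net rating on court minus off court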

Basketball has at least some counting stats that are meaningful, and people might already have a good idea of the average opportunity cost of a missed 3-point shot and missed 2-point shot, as well as the average value of a rebound. I hadn't realized how subjective the crediting of assists is. It makes sense that there might be an offsetting cost to high "steal" rates.

Has anyone tried something similar to the +/- system for soccer? The scores in soccer are so low that it might be difficult. I would imagine that Tango's approach for rating hockey goalies might be transferable to soccer goalies.

Since football season is here, does anyone have any suggestions for good "sabermetric" books on football?


Most pitches / game in a season (September 22, 2003)

Discussion Thread

Posted 9:17 p.m., September 22, 2003 (#4) - Michael Humphreys
  Tango,

Interesting list. Surprised that there haven't been more such seasons since 1919. Very few 1990s seasons. Have you reached any tentative conclusions regarding the damage, if any, caused by high single-season pitch counts? Just looking at the list, it's hard to say there is any immediately obvious impact. I guess what one would have to do is conduct some sort of time-series test to see if there is a statistically significant decline in ERA+ or DIPS performance one, two or three years after a 120+ season.
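One simple version of such a test, sketched in Python: pair each pitcher's ERA+ in the season before a high-pitch-count year with his ERA+ the season after, and run a paired t-test. The data pairs below are placeholders--the real list would have to be compiled from the seasons on Tango's list.

    from scipy import stats

    # (era_plus_before, era_plus_after) for each pitcher-season on the list -- placeholder data
    pairs = [(128, 115), (110, 112), (135, 120)]
    before = [b for b, _ in pairs]
    after = [a for _, a in pairs]
    t_stat, p_value = stats.ttest_rel(before, after)
    print(t_stat, p_value)    # a significant decline would suggest real damage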


Pitch Type and Count May Increase Risk of Elbow and Shoulder Pain in Youth Baseball Pitchers (September 27, 2003)

Discussion Thread

Posted 1:30 p.m., September 28, 2003 (#2) - Michael Humphreys
  Tango,

This article, which to some degree helps quantify conventional wisdom regarding the damage caused by throwing breaking pitches, at least for very young pitchers, reminded me of an idea I had regarding the pitcher with the greatest longevity in baseball history: Satchel Paige.

There's a great essay about him in Bill James' New Historical Abstract (see page 193). Satchel conspicuously "lacked" a good breaking pitch. But maybe Satchel figured out that by *avoiding* the use of breaking pitches, he could extend his career, which was by far the longest in the history of organized baseball. As usual, Bill says it best:

"So what you have, in Satchel Paige, is a great fastball, great control, a tremendous change, a great understanding of how to pitch, intelligence, determination, absolute composure--and a forty year career."

I'll quote from the posted article immediately below. Note that throwing change-ups actually *reduces* the risk of injury!

"Overall, almost 15% of all pitching appearances resulted in elbow or shoulder pain. The slider was found to have a significant relationship to elbow pain (86% increased risk), and use of the curveball accounted for a 52% increased risk of shoulder pain for the population studied. Use of the change-up pitch was associated with a 12% reduction in the risk of elbow pain and a 29% reduction in the risk of shoulder pain."

It's still a fair question whether a major league pitcher today could get away with avoiding throwing breaking pitches. As Bill has written, most strikeouts are on breaking pitches, and you almost always need to be above-average in strikeouts to have a major league career of any length.

Do we know of any starting pitchers who conspicuously limit the number of breaking pitches they throw? Does anybody today try to rely on the fastball, change and location?


Pitch Type and Count May Increase Risk of Elbow and Shoulder Pain in Youth Baseball Pitchers (September 27, 2003)

Discussion Thread

Posted 8:32 p.m., September 28, 2003 (#5) - Michael Humphreys
  David,

Thanks for the data. So Maddux might be throwing 76.3 + 19.4 = 95.7% non-breaking pitches? The relatively small number of breaking pitches would be consistent with his modest strikeout rates.

I've gotten the general impression that Maddux has developed a reputation for practically "asking" to get pulled from games after only six innings or so. Perhaps his pitch selection and conservatism about going deep into games might be part of a strategy to increase the chance that he'll be able to pitch for many more years.

Interesting that there isn't more overlap between the lists. I suppose that pitchers tend to rely more on change-ups when they *don't* have big fastballs, in order to make the fastballs look faster. What might be unusual about Paige is that he had a blazing fastball *and* a great change.


Pitch Type and Count May Increase Risk of Elbow and Shoulder Pain in Youth Baseball Pitchers (September 27, 2003)

Discussion Thread

Posted 11:21 a.m., September 29, 2003 (#7) - Michael Humphreys
  Tango,

Yes, Maddux's reputation for not staying in games past the first six innings or so was (a) only recently acquired and (b) just that, a reputation, or something I thought I had been hearing of, not a proven fact. Judging from your estimates, it looks as though his per-game pitch count dropped slightly only last year.

When I said his strikeout rate was "modest", I didn't mean to imply that it was actually below average, just that it wasn't exceptional. As Bill James keeps pointing out, once your strikeout rate is actually below average, you're on your way out.

Thanks for the post.


Evaluating Catchers (October 22, 2003)

Discussion Thread

Posted 9:58 p.m., October 22, 2003 (#4) - Michael Humphreys
  Great work. It sets a new standard for rating catchers.


What's a Ball Player Worth? (November 6, 2003)

Discussion Thread

Posted 1:14 a.m., November 13, 2003 (#35) - Michael Humphreys
  Tango and Studes,

For what it's worth, DRA shows Pedro as the best player in all of baseball in 1999, though not quite as valuable as SLWTs shows A-Rod and Barry to have been in 2000. During the 1997-2001 period, he was worth comfortably more than 50 runs a season on average.


What's a Ball Player Worth? (November 6, 2003)

Discussion Thread

Posted 2:42 p.m., November 13, 2003 (#41) - Michael Humphreys
  ColinM,

Good call. I *was* just looking at gross runs better than league average. Normally the differences between runs saved and added are not great, but when you're talking about Pedro and a dramatic reduction in runs, the difference is probably greater--non-linearity increases as you approach zero runs allowed. In addition, I didn't adjust for park effects, and Fenway is probably still pro-hitter, though not nearly as much as before.

Thanks re: DRA. I haven't focused on the pitching as much, because fielding seemed to be the bigger problem. I just wanted to add my support to the idea, ably expressed here, that pitchers, even today, can be the most valuable players in baseball.


David Pinto and fielding (November 10, 2003)

Discussion Thread

Posted 3:19 p.m., November 13, 2003 (#10) - Michael Humphreys
  Tango,

Not sure this is exactly on point, but I've been in touch with David regarding his system, and I think that one of the potentially valuable things about it is that it may help us better quantify "ball-hogging".

His model tracks the league average rate of out conversion for every batted ball with the same parameters--direction, trajectory (grounder, line drive, fly ball, maybe pop-up), speed, and pitcher-and-batter handedness. The out conversion probability is calculated both at the (league average) team and (league average) position level. The probabilities per position sum to the probability per team. Andruw's data gets included in the league data.

At the end of the year, you take the sum of the centerfielder probabilities of out conversion for every BIP on a team and compare that to the number of Andruw's gross putouts. I think Andruw was +24 or something like that.

Here's the neat thing. You could calculate the extent to which Andruw recorded a disproportionate amount of his putouts on BIP that had high out-conversion at the *team* level, because the data exists for each and every BIP he caught. For example, was he getting a lot of putouts on short flies that had a .9 probability of being caught by *somebody* on the team, with the normal distribution of probabilities by position being something like .4 SS, .4 2B, .1 CF?
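Here is the bookkeeping I have in mind, in sketch form (field names are invented; the per-BIP probabilities are whatever David's model supplies):

    def outs_above_expected(bip_rows, position, actual_putouts):
        # bip_rows: one dict per BIP, e.g. {"cf": 0.40, "ss": 0.40, "2b": 0.10},
        # giving the league-average out probability for each position
        expected = sum(row.get(position, 0.0) for row in bip_rows)
        return actual_putouts - expected

    def cheap_out_share(caught_rows, threshold=0.9):
        # of the BIP a fielder actually caught, what share had a *team-level*
        # out probability (summed across positions) that was already very high?
        team_probs = [sum(row.values()) for row in caught_rows]
        return sum(p >= threshold for p in team_probs) / len(team_probs)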

We might want to look at the Atlanta ratings at 2B and SS. If they're basically OK, the fact that Atlanta as a team had the best defense under David's system would provide good evidence that Andruw really is valuable, and not just taking cheap chances.

The possibility that individual players could increase their totals with discretionary infield and short outfield fly balls may explain why Soriano shows up as basically OK.

This approach of looking at *team* level probability of out conversion shades into another issue that highlights the differences between David's model and UZR. UZR is measuring fielder impact in terms of estimated runs saved; David's system measures outs recorded in excess of probable outs recorded. UZR is an Expected Value measurement; David's is a Probability measurement. When a player records an out on a ball with a very high probability of out-conversion, he does very little in changing the *Expected Value* of runs allowed by his team, as that value is measured the moment the ball leaves the bat. Adapting David's system to the "expectation" concept will permit calculations of runs-saved per fielder.

In the meantime, David's system is an excellent new method that can potentially provide excellent fielder ratings and provide further insight into DIPS.


David Pinto and fielding (November 10, 2003)

Discussion Thread

Posted 10:55 p.m., November 13, 2003 (#12) - Michael Humphreys
  Tango,

Thanks for the example. So I guess UZR could track if Andruw was recording a lot of cheap outs. But maybe we don't care, because he won't get that much "run" credit on those kinds of plays anyway. In other words, if Andruw takes an above-average number of "easy" plays in "shared" zones, but fails to cover an average number of "difficult" plays in "centerfield-only" zones, he effectively gets docked under UZR, which would be correct.


David Pinto and fielding (November 10, 2003)

Discussion Thread

Posted 4:17 p.m., November 15, 2003 (#19) - Michael Humphreys
  Studes,

I could very well be wrong about this (I probably am), but I thought the t-stats tell you how confident you can be about the accuracy (standard error) of the coefficients for the variables, not necessarily the relative impact of each variable on the outcome being modeled.

I'm also not sure the following alternative approach is any better, but it might be worth regressing marginal "pitcher" DER (or marginal outs) onto runs allowed, and then, separately, marginal "fielder" DER (or marginal outs) onto runs allowed, and compare the relative r-squareds. It might also be worthwhile simply comparing the standard deviation in "pitcher" DER outs v. "fielder" DER outs.

I think that we won't be able to draw firm conclusions on the relative effect of pitching and fielding on BIP from David's system until we have at least two years of data with which to perform a "persistency" test at the individual pitcher and fielder level.

If I recall correctly from various Primer posts, the "r" between successive individual UZR fielding ratings (which, in the case of infielders, excludes infield pop-ups) is about .5, which corresponds to an r-squared of about 25%. The "r" for individual pitcher BABIP year-to-year is about .2, corresponding to an r-squared of 4%. Dick Cramer's article about the impact of pitchers on BABIP indicates that, for purposes of comparing the relative impact of pitchers and fielders, the relevant comparison is between the r-squareds. By this measure, using the above r-squareds, fielders have six times the impact of pitchers. Of course, there's a *huge* amount of noise for BIP outcomes, year-to-year. DRA allocates the "noise" to the fielders, who account for most of the "signal".

Mike,

Yes, Oakland's foul territory is vast--I think ballparks.com (which baseball-reference.com posts for each team) says that it is the largest or one of the largest in the majors.

In developing DRA, I did a lot of analysis of infield fly outs, and tried to estimate the impact of foul territory. What I kept coming up with were estimates that Oakland would probably "give" an average fly ball staff an extra 12-15 infield fly outs a year compared with a ballpark with "average" foul territory. That's still an *extremely* rough guess based on a *primitive* "coding" of ballparks.com's verbal descriptions of foul territory. I would not be surprised if the effect were greater.

The regression result between infield fly outs and foul territory also nicely exemplifies the t-stat "accuracy"/r-squared "impact" issue discussed above. The r-squared for foul territory was tiny (less than 5%), but the t-stat was highly significant (p-value of .0001).

During the playoffs, there was a short story about how the Red Sox shifted their rotation to get "infield pop up artist" Wakefield to pitch in Oakland, precisely because of the foul territory, and ground ball pitcher Lowe in Boston.

Tango (and David),

Doesn't David's system track not only the "slice", but the trajectory (grounder, line-drive, fly ball, pop-up) and ball speed? Do you think that the trajectory and ball speed variables would serve as proxies for "depth" in the "slice"?

David,

What might be happening to Chavez is that a high number of infield fly balls on the left side of the field would significantly increase the sum of probable out-conversions for an average third-baseman playing the position in an average way. If Eric allowed the shortstop or left fielder to take as many of those discretionary chances as they could possibly handle, with Eric concentrating on the huge foul territory that only he could handle, I would guess that his rating would go down.

As Bill James has written (somehow that sounds like a Scriptural reference), there doesn't seem to be any relationship between third base putouts and fielder skill. I extend the same insight to other infield positions, for various reasons mentioned in the DRA article.

One simple approach to test this theory (and your database is outstandingly well-suited to this) might be to calculate separate ground ball and fly ball "outs-vs.-probable outs" for infielders. Something tells me that Chavez's ground-ball-only rating would go up, and Soriano's would drop from its currently "average" level.


David Pinto and fielding (November 10, 2003)

Discussion Thread

Posted 5:40 p.m., November 15, 2003 (#22) - Michael Humphreys
  Tango,

I think I get your example. So, using the numbers in the example, the lefty should have a -0.07 rating, but somehow under David's system he has a 0.0 rating?

Are the numbers in the example meant to be representative? Do ground balls hit against lefties have lower out-conversion rates than those hit against righties? Is this controlled for batter-handedness?

I think Mike Emeigh wrote in his "Jeter" series that lefties face more righties, and righties of course have a longer distance to run to beat out a throw on a ground ball, so you'd sorta expect the *average* out-conversion on ground balls hit against lefties *not* controlled for batter handedness to be higher. On the other hand, I think Mike had found that when you control for batter-handedness, the effect is partly counterbalanced because ground balls hit to the opposite field have lower out-conversion rates. Or something like that.


Win and Loss Advancements (November 13, 2003)

Discussion Thread

Posted 3:16 p.m., November 14, 2003 (#9) - Michael Humphreys
  Tango,

Your model is a significant advance over counting stat models, particularly for outliers such as Bonds. It also directly addresses the most difficult issue of pitcher evaluation--the real impact of relief pitchers. I also like the name "Advancements", which concisely captures the temporal feature of the model.

The value for Bonds looks good. Not only do all those IBBs contain his impact, but all of the quasi-IBBs have a similar effect. As Yogi Berra might say, "Barry's not so valuable anymore--he's too good."

One counting stat of yours that I like very much, Runs Produced (R + RBI - HR - (AB/10)), indirectly gets to the same result as well. I think Barry had 117 Runs Produced last year--obviously outstanding, particularly as it was accomplished in only about 130 or so games, but I'm pretty sure that Pujols was more valuable.


Win and Loss Advancements (November 13, 2003)

Discussion Thread

Posted 6:44 p.m., November 14, 2003 (#14) - Michael Humphreys
  Tango,

I've been tinkering with Runs Produced by Bonds over the course of his career, and it really helps discount the last three or four years. From 1990-2003, Bonds has been consistently in the 130-150 range, except for injury or strike-shortened seasons. 2001-2002 were *not* his best years on a "gross value" basis (by this measure), though on a Runs Produced/27 outs basis, 2002 was clearly his best year--though not freakishly out of line.

Have you done a BaseRuns analysis for Bonds? I tried going to your articles, but I couldn't be sure I'd be doing it right. As we've discussed, I believe that BaseRuns would do a better job with an outlier like Bonds. If Win Advancements shows that BaseRuns is better than Linear Weights (or the new, "linear" Runs Created), you'd have more evidence of the value of the BaseRuns approach for years we don't have PBP data.

If you have the latest version of the BaseRuns formula, it would be interesting to check Barry's 1999-2002 numbers. Also, if I understand it correctly, BaseRuns provides a "gross" runs number. Can the formula be adjusted to yield runs above average?
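For reference, the structure of BaseRuns is BsR = A*B/(B+C) + D: baserunners (A) times an estimated score rate (B/(B+C)), plus home runs (D), which always score. The B-factor coefficients below are one commonly cited version, not necessarily Tango's latest, so treat the sketch as illustrative only:

    def base_runs(h, bb, hr, tb, ab):
        a = h + bb - hr                                          # baserunners other than HR
        b = (1.4 * tb - 0.6 * h - 3.0 * hr + 0.1 * bb) * 1.02    # advancement factor (one version)
        c = ab - h                                               # outs, roughly
        d = hr                                                   # home runs always score
        return a * b / (b + c) + d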


Persistency of reverse Park splits (November 20, 2003)

Discussion Thread

Posted 11:19 p.m., November 20, 2003 (#5) - Michael Humphreys
  Studes,

I assume that El Sid did much better at Shea. That would make a lot of sense. He was probably one of the most extreme fly ball pitchers of all time (I think he once pitched a game with no ground balls), and Shea had one of the largest foul territories when he was pitching. (They've since added new seats in what used to be foul territory.)

Another factor might be visibility. El Sid struck out his share of batters, and the haze at Shea (aggravated by "track" lighting) probably helped him (and Seaver too).


Baseball Player Values (November 22, 2003)

Discussion Thread

Posted 3:52 p.m., November 23, 2003 (#8) - Michael Humphreys
  Great stuff.

I think the Rickey Henderson result is very interesting. In one of his Abstracts, Bill James ran some simulations--perhaps based on run rather than win expectations--to compare the impact of Rickey with Mays, and Rickey turned out to be surprisingly close in value to Mays, possibly even better.

The Win Expectation approach might provide more evidence of the greater importance of on-base percentage compared with slugging percentage, and, even more importantly, the effect of basestealing on Win Expectation. I think Rickey's "Wins" from basestealing *might* be a little higher than his Linear Weight basestealing runs/wins if you apply UZR or Tango's run weights. Perhaps Rickey was really stealing extra bases when they had the most impact on getting that one run needed for a win.

Another interesting result is that Brett came out better than Schmidt. Perhaps Schmidt received more low-Win-value quasi-IBBs than we realized.

Finally, I agree with the other posters that we need to find a way to factor in fielding for evaluating pitchers. Most pitchers with long careers have, on average, average fielders behind them. However, Maddux's rating is almost certainly improved to a non-trivial degree by Atlanta's fielding, which was the best overall in the '90s. Similarly, Palmer benefitted from outstanding Oriole fielding.


Baseball Player Values (November 22, 2003)

Discussion Thread

Posted 3:37 p.m., November 24, 2003 (#15) - Michael Humphreys
  Tango,

This is a quick, though much "dirtier" estimate. Some major caveats also apply. In spite of all that, your estimate is amazingly similar to mine.

Maddux pitched effectively full-time from 1988 through 1992 in Chicago, and, for the years covered by DRA, from 1993 through 2001 in Atlanta.

The Chicago defense was essentially average while Greg pitched for them: -16 runs over the course of 5 seasons.

The Atlanta defense was--with one important qualification I'll address below--outstanding, the best sustained team fielding from 1974-2001 by far. Over those 9 seasons, Atlanta never had a negative rating, and only 1 rating below +33 runs saved (1994, +9). The total runs saved over those 9 seasons was +515, or +57 per season.

The per-season errors in Atlanta DRA pitching and fielding runs minus actual runs allowed were: -14, +5, -20, +12, +1, -4, -4, -6, -20. If anything, DRA very slightly underestimates the overall effectiveness of Atlanta's pitching and defense.

Greg has averaged 235 IP per 162 games over the course of his entire career. Assuming average team IP are 1445, he pitched about 16% of his team's innings. 16% of +499 team runs saved (Chi plus Atl) is 80 runs to Greg's benefit, or about 6 runs per each of Greg's 14 full-time seasons.

Now the caveats.

You're right that we should measure by BIP, not IP. We should take Greg's percentage of team BIP, season-by-season, and multiply that by the team fielding rating that season. Even though he is not known as a strikeout pitcher, he almost certainly had above-average strikeout rates and less than 16% of his team's BIP.

As mentioned in the DRA article, the biggest "kink" in the DRA system is that CF putouts and estimated infield fly outs ("IFO") have a lot of overlap, and CF putouts are assigned to team fielding, whereas IFO are assigned to pitchers. Atlanta's IFO have been very, very low. I suggest that it's probably a good idea to allocate 10 runs or so of "Andruw's" per-season runs to IFO, which "belong" to pitchers. (That's four seasons, or 40 runs.) On the other hand, Atlanta had one very low IFO number before Andruw joined the team, and Atlanta also has very high team *pitcher* fielding ratings (assists), which Charlie Saeger has observed are a good proxy for ground ball pitching. Anyway, let's haircut Atlanta's fielding runs by about +40, to about 53 per season, which is probably still a little too high. Take 16% a season for Greg, and the *most* he could be getting subsidized--by probably the best fielding team of the past quarter century--and that's 8.5 runs a season. Since Greg has more Gold Gloves than any other pitcher in history--and Atlanta's team pitcher fielding ratings have been outstanding--I wouldn't be surprised if at least a few of those runs per season belong to Greg, not his fielders.

If you look at his whole career with the 40 run haircut, Greg has had at most 16% of 450 team fielding runs support over 14 seasons of full-time pitching. That's five runs a season.
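Putting the arithmetic above in one place (all figures are the rough ones from this post):

    atl_field_runs = 515 - 40          # nine Atlanta seasons, less the ~40-run IFO haircut
    chi_field_runs = -16               # five Chicago seasons
    total_support = atl_field_runs + chi_field_runs     # about 459; rounded to ~450 above
    share = 235.0 / 1445.0             # Maddux's rough share of team innings, ~16%
    print(round(share * total_support), round(share * total_support / 14, 1))
    # roughly 75 runs of fielding support in total, or about 5 per full-time season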

Between us we may have put together the first reasonably good estimate of the upper bound for per-season fielding impact on a starting pitcher, though interactions between GB/FB pitching and the relative abilities of the team infield/outfield are not considered.


ABB# (November 24, 2003)

Discussion Thread

Posted 9:39 p.m., November 24, 2003 (#18) - Michael Humphreys
  Gleeman Production Average? ("GPA")

Aaron Hitting Average? ("AHA") (Ahah!)

Aaron's Batting Average ("ABA") (too lawyerly?)


Baseball Musings: Defense Archives (December 5, 2003)

Discussion Thread

Posted 5:01 p.m., December 6, 2003 (#3) - Michael Humphreys
  Alan,

Perhaps I should let Tango answer, but I believe he was playing it straight--he "loves" the "intermediate" data at first base because it *disaggregates* the ratings into ground ball, line-drive, fly ball and pop-up ratings (or something like that).

I've written to David that disaggregation might be very helpful at the other infield positions. Grounders are skill plays, line-drives might be subject to a lot of luck, and flyball/popups are subject to ball-hogging. David's data could provide excellent evidence for the degree to which line-drive "range" is luck, or pop-up range is ball-hogging.

These factors may explain why the *aggregate* rating for Chavez at third is only average, and why Jeter and Soriano don't rate so badly--my guess is that Jeter and Soriano knew that Bernie didn't have the range and compensated by aggressively using their speed to go after short flies in center. Having written it that way, I'm making it sound like a good thing, and I suppose it is, but I wonder how much of it was just taking chances that Bernie (or an average centerfielder) might take anyway.


Baseball Musings: Defense Archives (December 5, 2003)

Discussion Thread

Posted 12:42 p.m., December 8, 2003 (#10) - Michael Humphreys
  Tango,

Fantastic analysis. And the result makes a lot of sense. Let's face it--there simply is not the same "shortage" of adequate fielders as there is for adequate hitters. The talent distribution that Bill James brought to everyone's attention is not nearly as skewed for fielding as it is for hitting.

Having said that, it is interesting, and in a way consistent with James's Defensive Spectrum, that there may be a slight "shortage" of fielding skill at SS and CF (though you'd expect the 2B and 3B results to be reversed).

By the way, DRA ratings in the 1974-2001 study (which only covers full-time players) were on average somewhere between 0 and +3. Can't remember offhand--but they were only very, very slightly better than league average.


Baseball Musings: Defense Archives (December 5, 2003)

Discussion Thread

Posted 3:10 p.m., December 8, 2003 (#13) - Michael Humphreys
  Beyond fantastic.

I'm relieved that the numbers match up pretty well with DRA. The standard deviation in runs saved at almost all the positions (except first base) was about a dozen, not ten--so there's still a little too much variation, but not much.

The max of +30 runs is also consistent with DRA. The only exceptions in the DRA sample (beyond a handful of "fluke" single-season values) appear to be Andruw (the ball hog, no doubt, at least from 1998-2001) and Barfield, whose range number is about +20 in his (three or four consecutive) good years, but whose arm ratings take him well above that, to about +30.


Baseball Musings: Defense Archives (December 5, 2003)

Discussion Thread

Posted 6:20 p.m., January 15, 2004 (#21) - Michael Humphreys
  Silver King,

Post your e-mail and I'll send you the results. The editors at baseballprimer have apparently been unable to get the full results posted. I'd put everything into a simple Excel spreadsheet that I had hoped could just be linked without formatting (as has been done with several UZR files), but maybe there's some other technical difficulty.


Baseball Musings: Defense Archives (December 5, 2003)

Discussion Thread

Posted 11:14 p.m., January 15, 2004 (#24) - Michael Humphreys
  Studes,

The book is coming along slowly because I'm having to cut and paste a lot of data from public sources (Retrosheet, etc.) so there won't be any restrictions on use.

Tango,

I'll send you the file.



Correlation between Baserunning and Basestealing (December 10, 2003)

Discussion Thread

Posted 12:00 p.m., December 10, 2003 (#1) - Michael Humphreys
  Tango,

This is a neat and surprising discovery that runs counter to twenty years of sabermetric wisdom. (We should think of a good name for this stat--how about Quick-And-Dirty Baserunning Runs ("QAD-BR")?) [Pronounced Quad-Bee-Are ;-)] I think where it might really have a big effect is in explaining Rickey Henderson's extraordinary career Win Advancements (based on that other guy's first attempt), which seem to imply that he was better than his Linear Weights value. As I mentioned in one of my posts to your Win Advancements thread, James got surprisingly high values for Rickey using sims--maybe as high or higher than for Mays. (And Mays would probably be someone else for whom the new .3*(SB-CS) metric might be revealing. His SB% was good, but not outstanding. But legend has it that he was an amazing baserunner.)

QAD-BR might also explain why when regressions are run on team offensive data, CS don't show up as bad as they are in isolation. CS "carry" information about non-stolen base baserunning effectiveness.

I suppose more analysis would be a good idea to determine whether the regression result you obtained works at the extremes, such as Rickey. Or for notoriously "over"-aggressive basestealers for whom we have Super-Linear Weights Baserunning Runs.

And I suppose the caveat to keep in mind is that, based on Win Advancements, basestealing and baserunning have, with the rarest exceptions, little practical effect. But the exceptions could be interesting.

Another thought--to refine the estimate for QAD-BR, might we consider triples?


Correlation between Baserunning and Basestealing (December 10, 2003)

Discussion Thread

Posted 1:05 p.m., December 10, 2003 (#5) - Michael Humphreys
  Tango,

I used to think Mike Schmidt was the best post-Mays, pre-Bonds non-pitcher, but I'm starting to come around to Rickey. Aside from his outstanding OBP and record-shattering baserunning, he was also a very, very good fielder, based on his DRA ratings. I only have ratings for the nine seasons in which he played 130 or more games at one outfield position, but even just looking at those seasons, he saved over 100 runs in the field. Basically, Rickey was top-flight-to-outstanding at everything except power, and was above-average in power as well. When you start adding it all up, he's been an incredible machine for winning games--particularly close games. If you're ahead by only a run in the ninth inning, you want a Rickey in the outfield; if you're behind by only a run in the ninth inning, I can't think of anybody else you'd want coming to the plate.

So adding the triples only increased the r from .63 to .65. Good thing to know--might as well keep things simple.

Not sure I understood the following: "Cutting the BR best-fit in 2, and you'd get something like BS+BR = .25*SB - .40*CS. We just have to be careful what we do." The being careful part I understand--what about the -.40 cost of CS for the 1980s?


Correlation between Baserunning and Basestealing (December 10, 2003)

Discussion Thread

Posted 12:31 p.m., December 11, 2003 (#21) - Michael Humphreys
  Tango,

Your last comment in your last post at fanhome was interesting. Paraphrasing a bit, when you look at the players with the highest gross number of SB+CS+3B, Super-LWTS Baserunning Runs drop just slightly if the SB success rate is average or below average. So there might be a slightly non-linear effect. Wonder whether there is a simple transformation of the data that would improve the fit.

Maybe the rule of thumb to use is take QAD-BR at face value if the SB success rate is at least average or slightly better; otherwise discount it ever so-slightly.

Also--I'm sure you've covered this and I've just lost track of it--does QAD-BR project total base*stealing* and base*running* runs in one number?


Diamond Mind Baseball - Gold Glove Winners (December 11, 2003)

Discussion Thread

Posted 9:24 a.m., December 12, 2003 (#11) - Michael Humphreys
  MGL,

Looking forward to the UZR ratings, which are the single most valuable piece of sabermetric work every year. The Andruw rating is close to Pinto's. Diamond Mind appraisals are worth reading (despite the Range Factor stuff), but one really wishes they would just provide a number. Or even a range of numbers based on the various criteria used. Do they even say why they won't provide numbers?


Banner Years and Reverse Banner Years for a Player's BB rate (December 14, 2003)

Discussion Thread

Posted 11:05 a.m., December 14, 2003 (#1) - Michael Humphreys
  Important work. It provides the best evidence I'm aware of that it might be possible to teach a young hacker better plate discipline. That could have a big impact on scouting and player development. Perhaps further studies will help us better predict which kinds of young hackers have the most potential to learn how to draw walks.

One memorable example of a banner year occurred too early to be included in your study. Mays 1971. James wrote something in the '80s suggesting that such a sharp improvement in plate discipline for an old player (I think Mays was close to 40) might be a warning sign that he's lost bat speed and is trying to cope with that. Still, it made Mays a very valuable player in 1971.


Do Win Shares undervalue pitching? (December 15, 2003)

Discussion Thread

Posted 2:43 a.m., December 17, 2003 (#27) - Michael Humphreys
  This is one great thread.

I'm on board with the basic idea that replacement level is the appropriate measure (though it would be an enormous task actually to measure it, and require all sorts of subjective judgments, particularly for non-starting pitchers), that outstanding pitchers are obviously significantly undervalued (just look at Pedro's runs allowed--can't get any more direct than that), and that fielding ratings are way too compressed--Derek Jeter deserves fewer Win Shares; Mike Cameron more. Given these very obvious and fundamental mismeasures of value, one has to wonder whether it's worth refining the system.

Perhaps there is a way. And maybe what may help us find a way is thinking about how Win Shares manages to do a good job on *offense*.

We all know how to value offense reasonably well. And because we can, we're able to draw a very clear line as to what constitutes a "zero" value (.250?) hitter under Win Shares, so we can add up Runs Created per player above that line in such a way that they actually add up to runs scored by the team above the league-average-minus-50% line. Studes has some great graphs showing how well Win Shares works on offense.

It seems to me that if we're going to make Win Shares work on its own terms (more on the benefits of that below), we need to find a way of explicitly and accurately *measuring* that ".250" line per pitcher (independent of fielding) and per fielding position (independent of pitching).

Somewhere in Win Shares (the book) James admits that he adjusted the fielding baselines to get more "variance" into fielder ratings. Maybe I missed it, but I don't believe that James anywhere says that "such-and-such combination of Ks, BBs and HRs (to use a DIPS approach) constitutes a .250 pitcher," or "such-and-such level of (context-adjusted) assists constitutes a .250 shortstop." He just sets baseline for fielders ("Subtract .200 from each Claim Percentage"--page 67) and looks to the pitcher's ERA.

Here's how we could do it. We need to examine--using DIPS, UZR, DRA, AED's system, David Pinto's system--what a team of .250 pitchers and fielders would actually look like, position-by-position, in terms of runs "allowed" relative to the league average. In other words, we need to find the lowest baseline at each position (based on actual levels of atrocious performance) that would cause the team collectively to give up 50% more runs than the average team, in the same way that we can (obviously far more easily) "construct" a team of ".250" hitters. (And by the way, offense and defense do in fact have approximately a 50/50 impact on wins.)
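
Schematically, here's the constraint I'm proposing we solve for (a sketch only; every number below is hypothetical):

    league_avg_allowed = 750
    target = 1.5 * league_avg_allowed        # a ".250" defense allows ~1,125 runs

    # Candidate ".250" floors, in runs allowed above average, drawn from the
    # worst sustained performances actually observed at each spot:
    floors = {"pitching staff": 220, "C": 15, "1B": 10, "2B": 20, "SS": 25,
              "3B": 20, "LF": 20, "CF": 25, "RF": 20}

    # Pick the floors so that a team sitting on all of them at once really
    # would give up 50% more runs than the average team.
    assert sum(floors.values()) == target - league_avg_allowed    # 375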

Now maybe there's no way of "mapping" such UZR, DRA or AED values "onto" pitching and fielding Win Shares. James believes it is impossible to context-adjust individual fielding statistics. So he first fixes fielding and pitching Win Shares at the team level before allocating them around using various 40/30/20/10 and other formulas. He also puts rigid limits on minimum and maximum team fielding win shares, which probably adds distortions.

Though the same "top-down" principle is applied to offense, it somehow seems different--as if he really calculates Runs Created per player *first* and *then* adjusts them to "fit" the number of runs the team actually scored.

But perhaps all we need to do is de-emphasize DPs and errors in allotting gross team fielding win shares, increase the weight for (park-adjusted) DER, and refine the weights James assigns for walks, strikeouts and home runs allowed. And maybe we could try a DIPS approach that causes the pitchers' ratings to add up to the team SO/BB/HR rating. (Yes, I know pitchers have some effect on BIP. I have my own simple approach to this problem and there are others out there as well.)

Another complicating factor is that James effectively position-adjusts *offense* by adjusting *fielding* win shares. (See the cryptic reference to the "intrinsic weights in modern baseball" on page 67). He grants catchers a lot of fielding win shares to get their overall ratings up without grappling with the issue of whether catchers really add that kind of value *in the field*. Maybe what we could do is simply ignore the "intrinsic weights" in actually trying to measure ".250" fielding and then add/subtract explicit position adjustments.

Now you may ask, "Why the hell should we go through all the effort to measure whatever ".250" fielding and pitching is, when we could go straight to a measure of *replacement level* performance?" After all, .250 is not replacement level--it's far below it. And the problem is further compounded in that we're effectively adding greater-than-.250 value on both offense and fielding, as also happens with Davenport's system. The resulting system will probably still accord somewhere between 5 and 10 Win Shares to a full-time player who adds no "major league value", i.e., value above replacement players available in the minor leagues.

Two answers. First, it's not clear what the replacement level is--.350? .425? To some extent, as James so aptly says somewhere (though not in connection with replacement value), "You just have to pick a number." At least when you're comparing full-time players, you'll still probably come out with marginal differences in Win Shares that reflect real differences in contributions to wins.

Second, the Win Shares system needs that extremely low baseline to prevent wins-per-run distortions when teams out- or under-perform their Pythagorean projection. I think you can guess how the math works without an example, but think about it this way--if we used a .425 line, and the team outperformed its Pythagorean projection by 5 wins or so (which happens all the time), you'd get some huge win shares for the marginal runs created or saved. Even using the .250 baseline, you get some fairly big distortions--think of the 1969 Mets players. Cleon Jones was not a 30-Win-Share player, even in 1969.
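
For anyone who wants the example anyway, here's the rough arithmetic in Python, with made-up numbers (and treating a ".425" line loosely as 85% of a league-average offense, which is only an approximation):

    # Win Shares splits 3 * (actual team wins) among players in proportion to
    # their marginal runs above the baseline.
    league_avg = 750
    team_runs = 800                       # offense only, to keep it simple
    actual_wins, pythag_wins = 100, 95    # team beat its Pythagorean projection by 5

    for frac in (0.50, 0.85):             # ".250" line vs. a ".425"-ish line
        marginal_runs = team_runs - frac * league_avg
        ws_per_marginal_run = 3 * actual_wins / marginal_runs
        windfall_per_run = 3 * (actual_wins - pythag_wins) / marginal_runs
        print(frac, round(ws_per_marginal_run, 2), round(windfall_per_run, 3))

    # With the .250 line there are 425 marginal runs, and the 15 "excess" win
    # shares from beating Pythagoras add ~0.035 WS per marginal run; with the
    # higher line there are only 162.5 marginal runs, and the same excess adds
    # ~0.092 per run, about 2.6 times the distortion for the same players.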

But I like the idea that player Win Shares actually add up to team Wins (multiplied by three). A more "accurate" replacement level system would probably have to forego that nice feature.

Maybe all of this is too much effort to make a system work that is based on a fundamental contradiction between marginal and absolute value, and that is unbelievably complicated and prone to clear errors. Nevertheless, I'm pretty confident that if we actually measured that .250 line for fielders and pitchers we'd see a lot more Greg Luzinskis "zeroed out" in the field (as they should be and currently aren't), which would free up more Win Shares for strong pitchers and fielders. And we wouldn't have to resort to mathematics dating from the Middle Ages (pace Fibonacci) to get the values to make sense.


Do Win Shares undervalue pitching? (December 15, 2003)

Discussion Thread

Posted 3:28 a.m., December 17, 2003 (#28) - Michael Humphreys
  Sorry for the long post and for failing to make the most important point explicit (I'm recovering from a brutal exam and my brain is fried).

The whole idea of the above is that we need to measure actual ".250" fielding and pitching (independent of each other) and give credit for value (claim points, runs saved, whatever) above that level. James just posits "floors" without explanation, and the floors selected result in terrible fielders getting defensive win shares that belong to better fielders and to pitchers. If that were fixed, we could go with the 50/50 offense/defense split that actually reflects reality and get the pitcher ratings up.


UZR 2003 Previews (December 18, 2003)

Discussion Thread

Posted 7:15 p.m., December 18, 2003 (#14) - Michael Humphreys
  MGL--thanks for the preview. Per David's point re: Aramis Ramirez's error rates, have you given any more thought to how the "errors" and "range" components of UZR work together? In one of the DRA threads, I think you mentioned that the "errors" factor led to some significant variances in shortstop ratings--Rey Ordonez in particular.

Tango's recent posting of Custom Linear Weights reminded me that errors are only .02 runs worse than allowing a single. Though there might be a non-linear impact to having atrocious error rates, the run-weight for errors suggests that, given two fielders with equal "range" (plays successfully made out of total BIP in their zones), someone would have to reach (and flub) 50 extra BIP to hurt his team by only 1 run. Or, to take a Mike Bordick type of example, a player who *effectively* has league-average plays made given zone opportunities by *avoiding* making the 25 or so errors typically made by an average shortstop would be only .5 runs better than the guy with average pure range and surehandedness.
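
The back-of-the-envelope version of that arithmetic:

    ROE_VS_SINGLE = 0.02   # an error costs ~.02 runs more than a single allowed

    # Fielder A reaches 50 more balls in his zones than Fielder B but boots
    # all 50 (errors instead of clean singles falling in): net cost is ~1 run.
    print(50 * ROE_VS_SINGLE)     # 1.0

    # Bordick-type case: he reaches 25 fewer balls (singles) but makes 25 fewer
    # errors than the average shortstop, for the same effective out rate; the
    # surehandedness is worth only ~0.5 runs vs. the guy with average range
    # and average hands.
    print(25 * ROE_VS_SINGLE)     # 0.5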

Have you thought of generating range-only infielder ratings to see how they might look? Aramis might come out OK.

Regarding moving around fielders: since Biggio just doesn't have the range to handle CF, has anyone thought of moving him back to catcher, where he began his career?


UZR 2003 Previews (December 18, 2003)

Discussion Thread

Posted 2:46 p.m., December 19, 2003 (#34) - Michael Humphreys
  MGL, I think I got the range/errors point down after re-reading your articles this past spring. Basically, you treat each ROE as a play made in the zone when calculating the number of plays made above or below the league average rate in the zone. Then you calculate the ROE above or below the league average rate given the number of BIP chances taken. Since ROE have virtually the same weight as hits allowed (at least in the infield), the total runs should come out to almost exactly the same as plays made given total BIP chances in the zone. But it's probably worth disaggregating if the data is available anyway. Not sure what the "denominator" is for non-ROE errors, but they're so small that it probably doesn't matter much.
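
Here's my reading of the two components in crude code form (the run weights are hypothetical, on roughly a single-vs.-out scale); MGL, correct me if I've mangled it:

    HIT_VS_OUT = 0.75    # runs saved by turning a would-be single into an out
    ROE_VS_OUT = 0.77    # an ROE costs about the same as a single, plus ~.02

    def range_runs(outs, roe, zone_bip, lg_play_rate):
        # ROE are counted as "plays made" on reachable balls, so this is pure range.
        return ((outs + roe) - lg_play_rate * zone_bip) * HIT_VS_OUT

    def error_runs(roe, balls_reached, lg_roe_rate):
        # Errors then charged separately, relative to the league ROE rate on
        # balls the fielder actually reached.
        return (lg_roe_rate * balls_reached - roe) * ROE_VS_OUT

    # Because ROE_VS_OUT is nearly identical to HIT_VS_OUT, the sum of the two
    # pieces lands very close to rating the fielder on clean plays made per
    # zone BIP alone--which is the point made above.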

Have you ever thought about the idea of Biggio moving back to catcher, now that his range is too poor for CF and 2B? His making the move from catcher to 2B was extraordinary; how much more extraordinary would it be for him to move back? Is there a precedent for somebody moving back to catcher? Maybe Biggio's arm is too weak now anyway . . . .


Sabremetrics 301: Custom Linear Weights (December 18, 2003)

Discussion Thread

Posted 6:04 p.m., December 18, 2003 (#1) - Michael Humphreys
  Tango,

Thanks for posting this great resource.

How would one evaluate Pedro using the data? His high SO, low BB and low HR reduce his run environment down to the two-runs-per-game level. So should we credit him with the normal weights (because his performance *created* that run environment) or with the weights at the two-or-three-run level (because that is the *marginal* impact of each of his events in the run environment he has created)?


Valuing Starters and Relievers (December 27, 2003)

Discussion Thread

Posted 2:39 p.m., December 31, 2003 (#47) - Michael Humphreys
Great article and thread. Continued research into both questions of pitcher evaluation--"ability" and "value"--is worth pursuing.

At first, it will probably be easier to refine "value" measurements.

David and Guy are on the right track, though it might be worth considering creating *three* separate replacement levels: starters, closers and middle-relievers. (Or full-time "Average" Leverage, one/two-inning High Leverage and two/three-inning Low Leverage pitchers.)

Replacement value questions are (or should be) "practical" questions--and as a practical matter, no team will allow a replacement-level middle relief pitcher (the bottom-of-the-barrel mop-up guy) to assume a "closer" role. Or at least not for a full season.

Determining a third replacement level (for middle-relievers) might also help in the evaluation of the Curt Schillings of the world. By going into the 8th inning, he creates value above the *middle-reliever* level.

Of course, once we start down this process, it leads to Tango's Win Advancement methodology.

Or does it? Tango, do Win Advancements measure pitcher win advancements against league-average pitching *across ALL innings* or league-average pitching *during the inning (or inning/base/out) situation* in question? Perhaps sample size issues make the latter measure impractical.

Maybe another way to pose the question is whether separate *Run* Expectations are calculated for each INNING/SCORE/base/out situation, and *then* translated into Win Expectations, or whether Run Expectations are calculated using *all* innings, and then translated into Win Expectations based on the inning/score.

The former approach would indirectly cause a starter in the eighth inning to be compared against middle relievers. If we could do that, we'd have a perfectly "granulated" pitcher evaluation system. That is, Curt Schilling's 7th inning Win Advancements would take into account that he is sparing his team the cost of going to middle-relievers, who are generally the worst pitchers on a team. (Of course, Win Advancements measure value over average, but I suppose replacement levels could be determined as well.)

Though probably an even more difficult problem, evaluating pitcher "ability" independent of usage context would be very useful, because it's almost certainly true that teams mismeasure "real" ability and badly misallocate pitching resources. (This in turn results in distorted *value* measurements, because the replacement-level talent pools and their impact on wins are out of whack, so value measurements vis-a-vis such data probably differ from "real" ability much more than happens with batters and fielders.)

So really both "ability" and "value" research support each other and are both worth pursuing.


DRA Addendum (Excel) (January 16, 2004)

Discussion Thread

Posted 7:32 p.m., January 17, 2004 (#3) - Michael Humphreys
  Mills,

The "boxed" numbers are the best five consecutive ratings for the player. The average of those numbers is included in the Index.

MGL,

Thanks. I do take pains in the article, however, to explain why the Buckner rating is incorrect, possibly to a large extent. I agree, however, that he was not a bad fielder.

The data used is BFP, IP, SO, BB, HBP, H, HR (2b and 3b at league level), PO per position, Assists per position, E per position (though regressions suggest that only Errors at pitcher and right field matter), WP/PB/BK, SB allowed, DPs, Runs Allowed. Out of that data the GB/FB and L/RHP factors are calculated. The format of the equations is provided in the first installment and in a thread to the third installment.

That's a great idea to create linear weights equations using UZR data. You'll probably get a great fit. Then you can apply the equations to pre-UZR years. It should work very well.
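
Something along these lines would do it (the file and column names are invented, and you'd want separate fits per position):

    import numpy as np

    # Rows = player-seasons with both UZR and traditional stats available.
    # Columns (illustrative): innings, putouts, assists, errors, DPs, team G/F.
    X = np.loadtxt("ss_traditional.csv", delimiter=",", skiprows=1)
    y = np.loadtxt("ss_uzr_runs.csv", delimiter=",", skiprows=1)  # UZR runs saved

    # Ordinary least squares: UZR runs ~ "linear weights" on traditional stats.
    X1 = np.column_stack([np.ones(len(X)), X])        # add an intercept
    coefs, *_ = np.linalg.lstsq(X1, y, rcond=None)
    print(coefs)

    # The fitted coefficients give a fielding equation you can carry back to
    # pre-UZR seasons (with the usual caveats about park, GB/FB, etc.).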


DRA Addendum (Excel) (January 16, 2004)

Discussion Thread

Posted 10:21 a.m., January 19, 2004 (#10) - Michael Humphreys
  Tango, thanks for the PDF post. The file looks great and is much easier to read than the Excel spreadsheet.


DRA Addendum (Excel) (January 16, 2004)

Discussion Thread

Posted 10:16 p.m., January 20, 2004 (#16) - Michael Humphreys
  FJM,

Fair point. To use scientific terminology, it's rare that you can reject the null hypothesis that the fielder is average, even if you use the very weak p-stat test of .17 (one standard deviation).

But the same applies to UZR ratings, or nearly so, as far as I can tell, because the year-to-year "r" is about the same for UZR and DRA (approximately .5). It might be slightly higher for UZR, but not by much. Time will tell whether David Pinto's probability-based ratings have a higher year-to-year "r" than UZR.

I seem to recall some e-mail thread, perhaps at fanhome, regarding the construction of confidence intervals for a single season rating if you have a series of such ratings and the year-to-year "r" information. It might be the same thing you've done, FJM, but I seem to recall there being some interesting wrinkles. Tango, do you recall the thread?
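
The simple version I have in mind is ordinary regression to the mean plus Spearman-Brown for the multi-year reliability (all numbers hypothetical); the wrinkles in that old thread may well have gone beyond this:

    import math

    r1 = 0.5         # single-season year-to-year "r" for the metric
    sd = 12.0        # spread of single-season ratings, in runs
    n = 3            # seasons of data for the player
    obs_avg = 8.0    # his average rating over those seasons

    # Reliability of an n-season average (Spearman-Brown), then shrink:
    rel_n = n * r1 / (1 + (n - 1) * r1)      # 0.75 with these numbers
    estimate = rel_n * obs_avg               # ~ +6 runs of estimated true ability

    # Rough interval from the remaining uncertainty about true ability:
    se = sd * math.sqrt(r1 * (1 - rel_n))    # ~4.2 runs here
    print(round(estimate, 1), "+/-", round(2 * se, 1))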

It might be an interesting exercise to calculate the proportion of players who, based on their year-to-year *offensive* performance, were reliably better than average at *hitting*, using the one-standard-deviation test. Of course there would be at least several, but probably a lot fewer than you would expect.

It would also be interesting to calculate the Linear Runs from BABIP for hitters, to find the hitters who were reliably better than average in BABIP. The number of "significant" above-average BABIP performers would probably be much smaller than for overall batting. Nevertheless, fans routinely include BABIP value in overall offensive value. (Though, as MGL has pointed out, you need to regress that component significantly if you're trying to find a true "ability" measure.)

Ultimately, there is always an important distinction between statistical significance and practical significance. Even if DRA ratings lack the former, they may still possess the latter, given that, as far as I know, no other ratings have (meaningfully) greater statistical significance and that teams *must* make staffing decisions using the best information available. For purposes of rating minor league fielders, lacking UZR, it would be good to use DRA. And, as explained in the article, it's also good to have DRA as a "back-of-the-envelope" check on extreme UZR ratings.

Mike (Green),

I point out in the DRA article that one of the Glenn Hubbard ratings is clearly wrong. Aside from that one number, they seem consistent and not extreme. See the Dale Murphy discussion, which points out that the Atlanta team rating at CF went up when Murphy missed a lot of time (thus "controlling" for whatever mismeasure there is for GB/FB). I am very confident that Dale was a below average centerfielder, for a host of reasons provided in the article.

Any GB/FB adjustment could be wrong. Any rating could be wrong. DRA deals with averages and statistically significant relationships (even if, as FJM points out, the resulting ratings might not have "statistical significance").

Charles,

DRA does *not* use PBP data. As explained in the article, DRA can be made to work for all periods of baseball history, though with less confidence for outfielders, for whom we lack separate LF/CF/RF putout data. For all infielders (including catchers) DRA works exactly the same throughout major league history using only publicly available data. I've also been developing some neat ways of estimating LF/CF/RF putouts for pre-Retrosheet seasons, and have obtained some nifty results.

DRA differs meaningfully from DFT, based upon how DFT is described in Mike Emeigh's "Jeter" articles. Clay's website says that the DFT methodology has been updated (including by ignoring errors--maybe Clay read the DRA articles).

First, assuming DFT's GB/FB estimate is the same, it is not as accurate as DRA's. I know because I checked by using Clay's method and comparing the results with UZR. One consequence of this is that DFT outfielder ratings are much more compressed than DRA ratings (which are more compressed than UZR outfielder ratings). DFT almost certainly understates the value of Willie Mays and Tris Speaker. Take a look at their DFT ratings and you'll see what I mean.

Second, DFT gives credit for infielder pop-ups/fly outs. Diamond Mind, UZR and DRA don't. One consequence of this approach is that, eyeballing the data, DFT infielder ratings seem to have more year-to-year variability, though I haven't measured this. I still have not read a compelling case, including a full disclosure of methodology, for obtaining estimates, using traditional data, of infielder fly outs that reflect *skill*--not FB pitching, ball-hogging, and larger foul territories.

Third, DFT *forces* runs-saved ratings at each of the positions for the team to add up (with pitching runs) to the actual number of runs allowed. Now on one level, this is a good thing: the system is fundamentally accountable, as is Win Shares. But I think it may result in a small sacrifice in the accuracy of individual fielder ratings.

Why? Well, the analogy would be to how Bill James calculates *batting* Win Shares. He applies the formulas to the individuals on a team, and if the total differs from the team's runs scored, he proportionally adjusts every batter's rating. But there is no basis for concluding that *each* player had a pro-rata impact on the relative efficiency or inefficiency of the lineup in converting the components of offense into runs. I would venture to guess that "unforced" offensive ratings have a higher year-to-year "r" and come closer to providing the true "value" measure for the player. It's probably not a big deal one way or the other, but I would imagine Clay's approach increases computational complexity.
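
The "forcing" step is just a pro-rata scaling, something like this (numbers hypothetical):

    rc = {"A": 110, "B": 95, "C": 80}     # formula Runs Created
    team_actual = 300                     # team actually scored 300; formulas say 285

    k = team_actual / sum(rc.values())    # ~1.053
    forced = {p: v * k for p, v in rc.items()}
    # Every batter gets ~5% more credit, whether or not he had anything to do
    # with the lineup's extra efficiency--that's the assumption I'm questioning.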

DRA estimates of fielder and pitcher runs add up to an *estimate* of runs allowed, and such estimate is reasonably close, on average, to *actual* runs allowed, in the same way that regression analysis (or Linear Weights) estimates of team runs scored are close to actual runs scored. But there is no second adjustment to "force" a "perfect" fit.

Fourth, the DRA equations (the format of which is provided in the article) together constitute the simplest comprehensive system for rating pitchers and fielders ever devised. I appreciate that you have to trust me on this to one degree or another. But try reading Mike's (well written) summary of DFT. Or Win Shares. In neither case can you reduce the calculations to a one-line equation per position involving only simple addition, subtraction, multiplication and division.

I agree that no system deserves an unqualified endorsement until it's been fully disclosed, everybody has the means to replicate its results, and it's possible to *test* those results against *more than one* alternative "correct" system. Not to get too high falutin' about it, that's how real science is conducted. Unfortunately, the only fan-implementable fielding systems out there that are fully disclosed are Win Shares and CAD. (The data for UZR and Pinto's systems is proprietary.) Win Shares simply does not provide ratings that match well with UZR or Diamond Mind. CAD may, I just don't know.

My ultimate goal in writing my book is to make the *full and complete* case for DRA being a rigorously tested, reasonably accurate, very simple, and conceptually coherent system that any fan can apply using publicly available data. Should I ever get the damn thing published, it will be the only system for evaluating fielders satisfying those criteria.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 11:39 p.m., February 2, 2004 (#3) - Michael Humphreys
  It was all new to me, well-written, and made a lot of sense. Getting on base would seem to be more in the batter's control than getting an extra-base hit. It would be interesting to know whether the batters who succeeded in improving their OBP in "pressure" situations lowered their slugging percentage at the same time (which would provide further evidence that the "clutch" batters made a conscious *adjustment* to avoid bad pitches and shorten their swings to make contact).


Amalgamation of Fielding Measures: Cedeno Charts (February 17, 2004)

Discussion Thread

Posted 1:04 p.m., February 17, 2004 (#9) - Michael Humphreys
  Avkash, thanks for the great chart! And Tango, thanks for the link.

I've run the basic stats and correlations on the updated data, and have the following observations.

I agree with all of Tango's and MGL's points.

UZR is still the gold standard, though the outfield ratings probably have too much variance (Pinto's seem to have slightly too little).

I would consult UZR over Pinto in the infield, and consider *both* in the outfield, because "ball-hogging" is more of a problem in the outfield and UZR ratings seem to have more variance than Diamond Mind ratings.

For non-zone/PBP systems, Rate2 (Davenport Fielding Translations) is clearly an improvement over Win Shares, but DRA is probably more accurate (and definitely computationally simpler) than DFT.

***

I split the data between infield and outfield. Why? Because only UZR (correctly) eliminates infield fly outs--in other words, UZR is measuring something different from the others.

Here are the main findings in the infield:

UZR, Pinto, DFT all have approximately the same standard deviation (13, 15 and 12, respectively). Win Shares has a std of 6. (DRA standard deviations in the infield are about 12.) Furthermore, even using the updated numbers, the mean/median of Win Share infield ratings is 2 (or almost half of 1 std above zero); whereas the others have mean/medians of around +1. Win Shares undervalues defense and overvalues regulars.

Pinto has a .66 correlation with UZR. DFT has a .54 correlation with UZR. (AED measured a .61 correlation between DRA single-season infield ratings and UZR ratings provided in the DRA article.) Win Shares has a .42 correlation with UZR.

I have the feeling that if David (Pinto) eliminated fly outs from infield ratings, the correlation would shoot up to .8 or higher. For *non*-PBP systems, AED has developed a system that better measures "skill" infield fly outs, and including such plays actually improves his correlations with UZR ratings, which, again, do not include fly outs. This is probably because infield fly outs to some degree measure speed/range, and may indirectly adjust for the groundball/flyball tendency of the team's pitching staff.

In the outfield, the standard deviations were:

UZR (18), Pinto (11), DFT (10), Win Shares (5). (DRA, again, is about 12.)

The relevant correlations with UZR:

Pinto (.69); DFT (.47); Win Shares (.27)[!]. (The Win Shares correlation with Pinto is .42.) I don't have a DRA/UZR single-season ratings comparison handy, but I don't recall the outfield numbers being appreciably worse than the infield numbers. (Actually, they were probably better, because there were more ratings "misses" in the infield (particularly at third, but also at first) than in the outfield, as shown in the DRA article.) So I would pencil in an approximately .6 to .65 correlation between DRA and UZR (or at least UZR ratings not inconsistent with Diamond Mind ratings).
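
For anyone who wants to replicate or extend these numbers, the computation is just the following (file and column names invented):

    import pandas as pd

    # One row per player-season, runs saved vs. average under each system;
    # split infield and outfield before running, for the reasons given above.
    df = pd.read_csv("cedeno_chart_infield.csv")

    print(df[["UZR", "Pinto", "DFT", "WS"]].std().round(1))         # spreads
    print(df[["Pinto", "DFT", "WS"]].corrwith(df["UZR"]).round(2))  # r vs. UZR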

My apologies for not getting out the DRA ratings for 2002 and 2003. I've been a bit overwhelmed by the academic program I'm in. I've also developed (or at least thought of) some new techniques to improve DRA ratings at third base and cope with the lack of separate LF/CF/RF data before 1970 or so.

Thanks again. As Tango would say, "Great stuff!"


Amalgamation of Fielding Measures: Cedeno Charts (February 17, 2004)

Discussion Thread

Posted 1:44 p.m., February 17, 2004 (#12) - Michael Humphreys
  Avkash--thanks.

studes--WS "dampens" fielder ratings too much and, when you account for this, overrates regulars. The average WS rating for a regular is very high when you "scale" it for the std of WS ratings. Other ratings systems show regulars +1 or +1.5 runs/season, with std of 12-15; WS shows regulars +2 runs/season, with std of 6.
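
In standard-deviation terms, which is what I mean by "scaling" (approximate numbers from the posts above):

    # Average regular's rating divided by each system's spread:
    systems = {"UZR/Pinto/DFT (roughly)": (1.25, 13.5), "Win Shares": (2.0, 6.0)}
    for name, (avg_regular, sd) in systems.items():
        print(name, round(avg_regular / sd, 2))
    # ~0.09 standard deviations above zero vs. ~0.33: relative to its own
    # spread, WS says regulars are much better fielders than the other systems do.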

Everyone--a clarifying point. I think it's worthwhile to look at Pinto/UZR ratings in the outfield together because (i) the measurement of outfielder skill is complicated by the ball-hogging factor (so it's worth trying different techniques), and (ii) at least in the outfield Pinto and UZR are trying to measure the same thing using PBP data: the relative number of fly balls caught by an outfielder, compared to what an average outfielder would catch given the same batted ball "opportunities". Pinto and UZR measure the latter in different ways, but I think that's a good thing.


Baseball Prospectus - : Evaluating Defense (March 1, 2004)

Discussion Thread

Posted 12:47 a.m., March 2, 2004 (#1) - Michael Humphreys
  Tango,

I'm with you on this. Every baseball fan owes a big "thank you" to MGL for providing essentially the same quality information that major league teams have.

Zone ratings are not "enigmatic"; DFTs are.

The "explanation" of DFTs provided in the attached article explains nothing. (Tomorrow I'll check out the hard copy of Prospectus 2004 to see if any more information is provided there.)

Of course I have to admit that I haven't fully revealed the formulas for DRA, but I've certainly provided a lot more detail about the DRA system than BP has provided regarding DFTs. I've also gone to the trouble of testing DRA results against UZR and Diamond Mind evaluations.

Perhaps the reason BP tries not to talk about zone ratings is that BP might not be allowed to "sell" such ratings directly or indirectly through its website.

All carping aside, I am glad to have DFTs around--they're pretty good, the price is right, and they have ratings for every player who ever played the game.


Baseball Prospectus - : Evaluating Defense (March 1, 2004)

Discussion Thread

Posted 6:17 p.m., March 2, 2004 (#31) - Michael Humphreys
  DFTs were changed significantly after 2002, there are no public formulas for the current form of DFTs, and some of the data used in DFTs seems to be pbp data (though I could be wrong about the last point).

In short, DFTs are no longer reproducible by fans. We have to take them on faith.

They're still very worthwhile to have around. I would not describe DFT (or DRA, or CAD, for that matter) as "light years" inferior to UZR. As mentioned in the DRA article, 2-to-3 year DRA ratings had a .8 correlation with UZR, as well as the same scale. It is true that UZR is significantly better for part-time players, but I believe that Tango has recommended that people try not to evaluate players with less than 2 years of even UZR ratings (i.e., even full-year UZR ratings are subject to distortion, given the extreme amount of randomness in fielding performance).

I bought BP 2004 and will look at it in more detail, but based on a first read, the method is "slightly different" from 2003 (which was *significantly* changed from the 2002 method, and without a detailed explanation--in fact, I seem to recall the BP website *drawing attention* to the fact that the system had been changed significantly between 2002 and 2003, including by no longer treating errors as different from plays not made). BP 2004 says that the GB/FB method has been changed "radically" from 2003, as well as the LF/CF/RF split.

BP is now generating minor league DFTs, which is not surprising, because that's where they're most useful, particularly to major league teams, who don't have minor league zone data. That may also explain why BP is no longer providing a complete (or even a reasonably well-detailed) explanation of DFT.


Baseball Prospectus - : Evaluating Defense (March 1, 2004)

Discussion Thread

Posted 3:11 p.m., March 3, 2004 (#34) - Michael Humphreys
  I posted the following at Clutch:

The BP enigma deepens.

I have looked through all of BP projections for 2004 fielding runs above average.

Something like 95% of the projections are below average, i.e., negative.

It's true that there is a slight skew to fielding talent distributions, but it's nothing like that for batting. As Tango and others have shown, regulars are only a tiny bit better than average, on average, so even if you're including non-regulars in the sample, close to half should be above average.

It's also true that regulars age more quickly in fielding, so they all should be declining. The problem is that projections for young prospects are also included.

Not one player is projected to save 10 or more runs next year in the field. The closest is first baseman Todd Helton, with a +9 projection. One player I can't recall was +8. Eric Chavez was +7 at third. Mike Cameron was +6 in center. Certainly fewer than 10 players are projected to be +5 or more.

Because the ratings are biased negatively, there are quite a few (20?) that are -10 runs or worse. But the spread is still too narrow. Jeter is projected for -10; Bernie Williams for -5. Both have clearly established, on a consistent basis over a few seasons, that they give up around 20 runs per 162 games. And they're both older.

MGL has taught us to regress fielding ratings considerably, but I very much doubt that MGL would not project a single player to be +10 or more in the field.

BP says that fielding has never been less relevant than it is today, and I agree. In the DRA article, I note that DRA ratings, particularly at short, are converging towards the average. But BP projections essentially imply that fielding is virtually valueless. That is not consistent with UZR, Diamond Mind, Pinto or DRA.

To be clear, there are a few single season ratings shown in BP 2004 that are significant--+21 for Mike Cameron last year. But the spread for even actual (not projected) ratings still seems much smaller than for Pinto and *very much* smaller than for UZR.

Curiosity got the better of me. I decided to look for all +15 seasons reported in the book, which provides 2001, 2002 and 2003 data for each player. There were only 32 such seasons. I chose +15 because even if you regress it by 50%, you still have something meaningful.

I believe only four players in the entire book have had even two +15 seasons from 2001-03. None had three. Andruw Jones was +18 in '01 and '02. Adam Kennedy was +17 in '02 and +15 in '03. Eric Chavez was +20 in '01, +25 in '03. He was, I believe, the only player with two +20 seasons. In '02 he was +3. I think there was some catcher who might have been the last one. If I missed one or two, it's only one or two.

The highest single season rating I recall was +26, for first baseman Todd Helton in 2003. Nope, sorry about that--Rey Sanchez was +27 at short in 2001.

There seems to be some skew in historical ratings--there are more negative ratings, and the negative ratings are probably "bigger" than the positive ratings I've reported above.

Though I'd have to double check by looking at the negative ratings, the data summarized above strongly suggests that the most current BP single season ratings for the last three years have very much smaller variance than UZR, Diamond Mind, Pinto or DRA ratings. (Diamond Mind doesn't provide runs-saved numbers, but in their essay on fielding, they mention the "spread" of plays made above or below average.)


Baseball Prospectus - : Evaluating Defense (March 1, 2004)

Discussion Thread

Posted 6:44 p.m., March 3, 2004 (#36) - Michael Humphreys
  Silver King,

I myself have noted that it's nice to have BP fielding ratings for free going all the way back. In general they match up reasonably well with DRA ratings from 1974-2001, though the BP outfield ratings (at least what's up on the website) have too little variance (look at Tris Speaker compared to Cobb) and the infield ratings seem to have more "noise" (but roughly the right variance), because they include pop-ups (which Diamond Mind, UZR, and DRA ignore for infielder ratings).

But the reason I liked the ratings so much was that I thought I knew how they were calculated. The more I've looked into it, the more completely mysterious they've become. I'm still inclined to think they're better than anything else out there for pre-UZR periods, but . . . .

The variance in defensive performance in the 2004 book seems *way* more compressed--more than what I was used to seeing on-line and more so than the two best PBP systems: MGL's UZR and Pinto's Probabilistic Model of Range.

In a prior Primate Studies thread based on a comparison of fielding systems, the standard deviation of Pinto's system was about 12 runs (I think) and of UZR about 15. (DRA is about 12.) I thought Tango had also done some simulation studies suggesting an average standard deviation per position of about 12 to 15 runs. Win Shares is around 6.

If fewer than 1% of player seasons (32 out of how many?) in the most recent book have a positive rating of greater than +15, it is almost a certainty that the new BP ratings have a standard deviation closer to Win Shares and in any case clearly different from the two best "zone"-type systems I know of.
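
A quick sanity check on that claim, treating single-season ratings as roughly normal and centered near zero (and ignoring the negative skew noted above):

    from statistics import NormalDist

    for sd in (15, 12, 6):     # UZR-ish, Pinto/DRA-ish, Win-Shares-ish spreads
        share = 1 - NormalDist(mu=0, sigma=sd).cdf(15)
        print(sd, round(100 * share, 1), "% of seasons should beat +15")

    # With a 12-15 run spread you'd expect roughly 10-16% of player-seasons
    # above +15; under 1% of the seasons in the book do, which is what you'd
    # see with a Win Shares-like spread of ~6.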

And, if you'll pardon my repeating myself, it seems very strange to trumpet your fielding system if it (i) essentially shows that fielding has no persistent material effect and (ii) rates almost all fielders as below average.


Baseball Prospectus - : Evaluating Defense (March 1, 2004)

Discussion Thread

Posted 8:38 a.m., March 5, 2004 (#38) - Michael Humphreys
  Studes,

Thanks for the comment. In the 2003 on-line article about DFTs (which I can no longer find), Clay said that errors were being de-emphasized and did not say that DPs would be emphasized more, so range is pretty much all that's left (as it should be, except at catcher and first base). I think the article emphasized that a "top down" accounting system (similar to Win Shares) would be used. Perhaps a hard "cap" on *team* fielding runs and a too-high "floor" for each *position's* fielding runs results in too little variance, as happens with Win Shares.

A few new thoughts:

It's odd that none of the posters to the "BP-shouldn't-call-Zone-Ratings-Enigmatic-and-ignore-MGL's-contributions" threads here or at Clutch appears to be interested in the "enigma" of *BP's* defensive rating methodology *and* results. (Though PECOTA attracted some comment at Clutch.)

After looking at everything I've been able to find at BP's website (not including Premium) and this year's hardcover book, it has occurred to me that BP nowhere claims to rely exclusively on traditional statistics--i.e., they never say they *don't* use STATS-type data.

Their claim that "front offices have access to much more advanced metrics than the public, some specifically charting where each batted ball is hit, how hard, and how high" is in some sense true. Though the STATS data MGL acquires has zone, speed and some trajectory data (classifications of line-drives, fly balls, pop-ups, etc.), baseball teams have even *more* detailed data. Recall in Moneyball that a more advanced version of STATS data was sold by a company called AVM (I think) to the A's (then replicated by budget-conscious DePodesta). The "granulation" of AVM data seemed way beyond what STATS provides--in fact, it had to be, or else no team would have ever bought it. (By the way, though more granulated data is better than less granulated data, MGL's data is well more than adequate to yield good ratings.)

It may not be coincidental that both Diamond Mind and BP are more "enigmatic" than MGL or Pinto. (Diamond Mind doesn't provide actual runs-saved estimates, only "letter"-type grades.) As profit-making concerns, they may be restricted contractually from effectively "reselling" STATS data.

Anyway, to recap: (i) we don't know what BP's defensive formulas are, (ii) we don't even know what defensive data BP uses, though it seems to me likely they use STATS data, (iii) the resulting ratings "skew" weirdly negative, (iv) the variance in ratings appears to be much lower than estimates provided by Tango, MGL, Pinto and DRA, and (v) the projections for 2004 indicate that virtually no fielder has a meaningful positive impact on run prevention.



Copyright notice

Comments on this page were made by person(s) with the same handle, in various comments areas, following Tangotiger © material, on Baseball Primer. All content on this page remain the sole copyright of the author of those comments.

If you are the author, and you wish to have these comments removed from this site, please send me an email (tangotiger@yahoo.com), along with (1) the URL of this page, and (2) a statement that you are in fact the author of all comments on this page, and I will promptly remove them.